Abstract

Approximately 15-30% of all breast cancer tumors are estrogen receptor negative (ER-). Compared with ER-positive (ER+) 
disease they have an earlier age at onset and worse prognosis. Despite the vast number of risk variants identified for 
numerous cancer types, only seven loci have been unambiguously identified for ER-negative breast cancer. With the aim of 
identifying new susceptibility SNPs for this disease we performed a pleiotropic genome-wide association study (GWAS). We 
selected 3079 SNPs associated with a human complex trait or disease at genome-wide significance level (P<5×10⁻⁸) to 
perform a secondary analysis of an ER-negative GWAS from the National Cancer Institute's Breast and Prostate Cancer 
Cohort Consortium (BPC3), including 1998 cases and 2305 controls from prospective studies. We then tested the top ten 
associations (i.e. with the lowest P-values) using three additional populations with a total sample size of 3509 ER+ cases, 
2543 ER- cases and 7031 healthy controls. None of the 3079 selected variants in the BPC3 ER- GWAS were significant at the 
adjusted threshold. 186 variants were associated with ER- breast cancer risk at a conventional threshold of P<0.05, with 
P-values ranging from 0.049 to 2.3×10⁻⁴. None of the variants reached statistical significance in the replication phase. In 
conclusion, this study did not identify any novel susceptibility loci for ER- breast cancer using a "pleiotropic approach". 

Citation: Campa D, Barrdahl M, Tsilidis KK, Severi G, Diver WR, et al. (2014) A Genome-Wide "Pleiotropy Scan" Does Not Identify New Susceptibility Loci for 
Estrogen Receptor Negative Breast Cancer. PLoS ONE 9(2): e85955. doi:10.1371/journal.pone.0085955 

Editor: Paolo Peterlongo, IFOM, Fondazione Istituto FIRC di Oncologia Molecolare, Italy 

Received October 25, 2013; Accepted December 4, 2013; Published February 11, 2014 

Copyright: © 2014 Campa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 



PLOS ONE | www.plosone.org 






February 2014 | Volume 9 | Issue 2 | e85955 



ER-Pleiotropy Scan 



Funding: This work was supported by the US National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233-07 to D.J.H.; U01- 
CA98710-06 to M.J.T.; U01-CA98216-06 to E.R. and R.K.; and U01-CA98758-07 to B.E.H.); and Intramural Research Program of National Institutes of Health and National 
Cancer Institute, Division of Cancer Epidemiology and Genetics. The funders had no role in study design, data collection and analysis, decision to publish, or 
preparation of the manuscript. 

Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: d.campa@dkfz.de 



Introduction 

Estrogen receptor-negative (ER-) breast cancer (BC) comprises 
15 to 30% of all breast tumours (depending on the population) and 
has an earlier age at onset and a worse prognosis compared with 
estrogen receptor-positive (ER+) disease. It is more common 
among women of African-American origin and it is also the breast 
cancer type associated with BRCA1 mutations [1,2]. Genome-wide 
association studies (GWAS) have identified thousands of common 
human genetic variants associated with risk of hundreds of 
quantitative traits and human diseases [3,4]. Only seven 
susceptibility loci have been specifically identified for ER- BC [5-7]. In 
a GWAS, hundreds of thousands or even millions of polymorphisms 
are interrogated at the same time in a strictly agnostic way, 
i.e. ignoring any possible a priori knowledge of the SNPs tested. 
This model requires use of a stringent significance threshold 
(P<5×10⁻⁸) to correct for the numerous statistical tests performed 
and to avoid false positive findings. As a consequence, variants 
with a true but weak association may go undetected and, 
therefore, unreported: strict avoidance of false positives may 
lead to false negatives [8]. By running secondary analyses using a reduced 
number of SNPs defined by biological knowledge or hypothesis, 
the required threshold of significance may be relaxed and the 
power to detect real associations of modest effect may be 
increased. 

Pleiotropy, a genetic mechanism defined as one gene, or in this 
case one allele, having an effect on multiple phenotypes [9], 
offers one basis for selecting candidate SNPs for such a 
secondary analysis. There are regions in the human genome, 
called "nexus" regions, which have been associated with more than one 
distinct cancer type [10]. The most striking examples for cancer 
are: the 8q24 region, which harbors multiple loci associated with 
breast, colon, prostate, bladder and/or ovarian cancers; the TERT 
region, which has been associated with pancreatic, bladder, lung 
and prostate cancers; and the p16 region on chromosome 9p21, and 
6q25 and 11q13, associated, respectively, with non-Hodgkin 
lymphoma (NHL) and nasopharyngeal carcinoma and with 
bladder, breast and prostate cancer [10]. To the best of our 
knowledge, a pleiotropic approach to identify novel cancer risk 
SNPs has been reported only once [11]. A pleiotropic GWAS 
performed to examine gene regions associated with pancreatic 
cancer identified a region (HNF1A) previously associated with 
several diseases including type 2 diabetes [12,13]. 

We used a similar approach to search for new genetic variants 
associated with susceptibility to estrogen receptor-negative breast 
cancer. We selected all the SNPs that had been associated with a 
human disease, trait or phenotype at genome-wide significance 
(P<5×10⁻⁸) and performed a secondary analysis on data from a 
GWAS of ER- breast cancer by the National Cancer 
Institute's Breast and Prostate Cancer Cohort Consortium (BPC3) 
[7]. We then tested the top associations using three additional 
populations with a total sample size of 3509 ER+ cases, 2543 ER- 
cases and 7031 healthy controls. 



Materials and Methods 

Ethics statement 

The Mammary Carcinoma Risk Factor Investigation (MARIE) 
study was approved by the ethics committees of the University of 
Heidelberg and the University of Hamburg. Written informed 
consent was obtained from all subjects. 

For the BPC3 study, written informed consent was obtained 
from all subjects and ethical approval was obtained from the 
relevant institutional review boards of each cohort. The cohorts 
are: the European Prospective Investigation into Cancer and 
Nutrition (EPIC), the Melbourne Collaborative Cohort Study 
(MCCS), the Nurses' Health Study (NHS), the American Cancer 
Society Cancer Prevention Study II (CPS-II), the Prostate, Lung, 
Colorectal, Ovarian Cancer Screening Trial (PLCO), and the 
Multiethnic Cohort (MEC). 

Study populations 

We performed the study in two phases: first, we analysed data 
from the BPC3 ER- GWAS and second, for replication purposes, 
we used genotyping or existing data from selected breast cancer 
cases and controls collected by three different studies: CPS-II, 
MCCS and the MARIE study. Individuals from CPS-II 
contributed cases and controls to both the initial GWAS and the 
replication phase, but there were no overlaps between the sample sets 
used in the two phases of this study. 

The BPC3 has been described extensively elsewhere [14]. It 
consists of cases and controls selected from large cohorts assembled 
in Europe, Australia and the United States that have both 
biological samples and extensive questionnaire information 
collected prospectively. Cases were women who were diagnosed 
with invasive BC after enrolment; the diagnosis was confirmed by 
tumor registries or by medical records. Controls were considered 
eligible if they were free of BC until the follow-up time for the 
matched case subject. Case and control subjects were matched for 
ethnicity and age and, for some cohorts, also for additional criteria 
such as country of residence. Laboratory techniques and relevant 
quality control procedures for the BPC3 ER- GWAS are reported in detail 
elsewhere [7]. Briefly, genotyping was performed at three centers 
(Imperial College London, UK; University of Southern California, 
USA; and the NCI Core Genotyping Facility, USA). Subjects from 
CPS-II, EPIC, MEC, PLCO and PBCS were genotyped using the 
Illumina Human 660k-Quad SNP array (Illumina, San Diego, 
CA, USA); NHSI/NHSII and part of the PLCO study had been 
genotyped previously using the Illumina Human 550 SNP array 
(Illumina, San Diego, CA, USA) [15]. For this study, 1998 ER- 
invasive cancer cases and 2305 controls belonging to the BPC3 
cohort were used. 

The MARIE study population comprises BC patients who 
participated in a population-based case-control study conducted in 
two study regions in Germany (Hamburg and Rhine-Neckar-Karlsruhe). 
Cases were women diagnosed with histologically 
confirmed primary invasive or in situ breast tumor, aged 50 to 
74 years, and residents of the study regions. Detailed information 
on tumor hormone receptor status was collected using clinical and 
pathology records. Controls were randomly selected from 
population registries and frequency-matched by year of birth and study 
region. The study has been described in more detail elsewhere 
[16]. For the present analyses, 2027 cases (370 ER-/1657 ER+) 
and 1778 controls were included. 

Figure 1. Manhattan Plot of all SNPs analyzed in phase one of the study. 
doi:10.1371/journal.pone.0085955.g001 

SNPs selection (phase one) and genotyping 

The SNPs to be analysed in phase one were selected 
using the National Human Genome Research Institute's 
(NHGRI's) catalog of published GWA studies 
(http://www.genome.gov/gwastudies/) [4]. It contains summary information 
on polymorphic variants reported to be associated with a human 
disease, trait or phenotype in a GWA setting at the significance 
level of P<1.0×10⁻⁵. The data from the catalogue were 
downloaded in May 2012 and comprised 7986 SNPs. 
Approximately 60% (n = 5794) of the polymorphic variants reported in the 
catalogue had a P value higher than 5×10⁻⁸ and were, therefore, 
excluded from further analysis. Of the remaining 3192 SNPs, 1688 
(58%) were genotyped in the BPC3 scan. For 452 variants (14.2% of the total 
selected SNPs), PLINK [17] was used to identify highly correlated 
(r² > 0.9 in HapMap3 CEU) surrogate SNPs 
genotyped in the BPC3 GWAS. Data for 939 SNPs were imputed: 901 (28.3% of 
the total selected SNPs) from HapMap 2 and 38 (1.1% of the total 
selected SNPs) from HapMap3. The remaining 113 (3.6% of the 
total selected SNPs) variants were dropped from the analysis since 
no surrogate was found and it was not possible to impute data. 
Thus, data for 3079 out of 3192 catalogued SNPs (96.4%) were 
used for this study. 
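
The SNP bookkeeping in this paragraph can be reconciled with a short calculation. The sketch below is illustrative only and is not part of the original analysis pipeline; the dictionary keys are hypothetical labels, while the counts are those reported in the text.

```python
# Counts reported in the text for the catalogue SNPs with P < 5e-8.
# The category labels are hypothetical names chosen for this sketch.
SELECTED = 3192

categories = {
    "genotyped_in_bpc3": 1688,      # directly genotyped in the BPC3 scan
    "surrogate_r2_gt_0_9": 452,     # replaced by a correlated genotyped SNP
    "imputed_hapmap2": 901,         # imputed from HapMap 2
    "imputed_hapmap3": 38,          # imputed from HapMap3
    "dropped": 113,                 # no surrogate and not imputable
}


def reconcile(counts, total):
    """Return (SNPs accounted for, SNPs available for analysis)."""
    accounted = sum(counts.values())
    if accounted != total:
        raise ValueError(f"categories sum to {accounted}, expected {total}")
    analysed = accounted - counts["dropped"]
    return accounted, analysed


accounted, analysed = reconcile(categories, SELECTED)
print(f"accounted for: {accounted}, carried into phase one: {analysed}")
```

The five categories sum to the 3192 selected SNPs, and removing the 113 dropped variants leaves the 3079 SNPs analysed in phase one.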

The 3079 remaining SNPs were looked up in the BPC3 GWAS 
and ranked by P-value to check for 
association. All previously known breast cancer risk SNPs were 
excluded from the analysis. 

Replication (phase two) genotyping 

In order to confirm the ten most significant findings we used 
additional BC cases and controls from three studies of women of 
Caucasian descent as a replication set: the CPS-II [18], consisting 
of 1530 estrogen receptor positive (ER+) cases, 53 ER- cases and 
2395 healthy controls; the MCCS [19], with 322 ER+ cases, 122 
ER- cases and 823 healthy controls; and the MAmmary 
carcinoma Risk factor InvEstigation (MARIE) [16], with 1657 
ER+ cases, 370 ER- cases and 1778 healthy controls, for a total 
of 3684 cases and 4996 controls. Specifically, rs498872, rs2000999, 
rs12150660, rs780094, rs11229030 and rs13397985 were 
replicated in silico for the MARIE, CPS-II and MCCS studies. These six 
SNPs were genotyped as part of the iCOGS study using a custom 
Illumina array. In the original iCOGS publications, SNPs with 
MAF <1%; call rate <95%; or call rate <99% and MAF <5%; 
and all SNPs with genotype frequencies that departed from 
Hardy-Weinberg equilibrium at P<1×10⁻⁶ for controls or 
P<1×10⁻¹² for cases were excluded [5,20]. The remaining four 
SNPs (rs8396, rs4788815, rs2571391 and rs780092) were not present on 
the iCOGS array and were, therefore, genotyped de novo for the 
MARIE study by TaqMan. The mean genotyping success rate was 
94.4% (88.2%-96.7%). The percentage of samples 
genotyped twice for quality assurance was 9.5%, and the genotyping 
concordance was 99.99%. Departure from Hardy-Weinberg 
equilibrium was tested for the ten SNPs in the control 
subjects of each study. 
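
As an illustration of the Hardy-Weinberg check described above, the following sketch implements a Pearson chi-square test with one degree of freedom for a biallelic SNP. It is not necessarily the test used by the authors or by iCOGS (exact tests are also common); the function name and the example genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p                              # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
chi2_eq, p_eq = hwe_chi_square(490, 420, 90)     # exactly at HWE proportions
chi2_dev, p_dev = hwe_chi_square(500, 400, 100)  # mild heterozygote deficit
print(f"at equilibrium: chi2 = {chi2_eq:.3f}, P = {p_eq:.3f}")
print(f"mild deviation: chi2 = {chi2_dev:.3f}, P = {p_dev:.3f}")
```

A SNP would then be flagged when its P value in controls falls below the chosen cut-off, for example the 1×10⁻⁶ threshold quoted for the iCOGS controls.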

Statistical analysis 

Logistic regression adjusted for five principal components, age 
(at diagnosis for cases and at selection for controls) and cohort was 
used to generate ORs, 95% CIs, and P values for each of the 3079 
SNPs selected from the BPC3 ER-negative GWA data set and for 
the 10 SNPs in the replication phase. The replication was 
performed using ER- and ER+ breast cancer cases. Because several 
SNPs associated with ER- BC are also associated with ER+ BC, 
we analyzed both overall BC risk (ER+ and ER- cases combined) 
and ER- specific risk (ER- cases alone) to increase our power to find a true 
association. We had more than 90% power to replicate any of the 
associations observed in the discovery phase when considering all BC 
cases, and over 50% (53%-72%) power when considering only ER- 
cases, at an alpha of 0.05. Using a conservative Bonferroni 
correction, we considered P-values <1.6×10⁻⁵ 
(0.05/3079) as statistically significant. 
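
The Bonferroni threshold quoted above follows directly from the number of SNPs tested. The snippet below is a minimal sketch of that arithmetic; the helper name is hypothetical, and the example P value is the smallest one reported for phase one.

```python
N_TESTS = 3079   # SNPs carried into the phase-one look-up
ALPHA = 0.05     # family-wise error rate

# Bonferroni correction: divide the overall alpha by the number of tests.
bonferroni_threshold = ALPHA / N_TESTS
print(f"{bonferroni_threshold:.2e}")   # 1.62e-05


def passes_bonferroni(p_value, threshold=bonferroni_threshold):
    """Return True if a P value is below the Bonferroni-adjusted threshold."""
    return p_value < threshold


# Smallest phase-one P value reported in the text (2.3e-4).
print(passes_bonferroni(2.3e-4))       # False
```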
Table 1. The strongest associations between the pleiotropic SNPs and breast cancer risk. 

                           ER+/ER- cases                          ER- cases
SNP name         Study    OR(a)  95% CI(b)    P_trend(c)    OR(a)  95% CI(b)    P_trend(c)
rs2000999 [22]   BPC3     0.83   (0.75-0.93)  6.72E-04      0.83   (0.75-0.93)  6.72E-04
                 MCCS     1.05   (0.85-1.30)  6.30E-01      1.14   (0.80-1.62)  4.80E-01
                 CPS2     0.94   (0.85-1.04)  2.41E-01      0.88   (0.53-1.46)  6.19E-01
                 MARIE    1.02   (0.91-1.15)  7.26E-01      0.99   (0.81-1.20)  8.85E-01
rs12150660 [23]  BPC3     0.84   (0.76-0.93)  8.58E-04      0.84   (0.76-0.93)  8.58E-04
                 MCCS     1.00   (0.82-1.21)  9.70E-01      0.79   (0.56-1.11)  1.70E-01
                 CPS2     0.94   (0.84-1.05)  2.74E-01      1.21   (0.74-1.98)  4.53E-01
                 MARIE    1.16   (1.04-1.31)  1.07E-02      1.18   (0.97-1.42)  9.87E-02
rs13397985 [24]  BPC3     0.84   (0.76-0.94)  2.18E-03      0.84   (0.76-0.94)  2.18E-03
                 MCCS     1.09   (0.88-1.35)  4.20E-01      0.95   (0.66-1.38)  8.00E-01
                 CPS2     0.94   (0.82-1.08)  3.58E-01      1.16   (0.65-2.07)  6.17E-01
                 MARIE    1.11   (0.98-1.26)  8.65E-02      1.16   (0.95-1.42)  1.46E-01
rs780094 [25]    BPC3     0.87   (0.80-0.94)  9.97E-04      0.87   (0.80-0.94)  9.97E-04
                 MCCS     1.01   (0.86-1.19)  8.70E-01      0.94   (0.72-1.24)  6.80E-01
                 CPS2     0.96   (0.88-1.04)  3.00E-01      0.98   (0.67-1.45)  9.29E-01
                 MARIE    1.02   (0.92-1.12)  7.38E-01      0.97   (0.83-1.14)  7.34E-01
rs11229030 [26]  BPC3     0.87   (0.80-0.95)  1.95E-03      0.87   (0.80-0.95)  1.95E-03
                 MCCS     0.98   (0.83-1.15)  8.00E-01      1.01   (0.77-1.33)  9.20E-01
                 CPS2     1.02   (0.94-1.11)  6.60E-01      1.31   (0.89-1.94)  1.72E-01
                 MARIE    0.98   (0.89-1.08)  6.22E-01      0.85   (0.72-1.00)  4.94E-02
rs780092 [27]    BPC3     1.20   (1.07-1.34)  2.06E-03      1.20   (1.07-1.34)  2.06E-03
                 MARIE    1.03   (0.94-1.14)  4.96E-01      0.90   (0.74-1.10)  3.17E-01
rs4788815 [28]   BPC3     0.85   (0.78-0.93)  5.29E-04      0.85   (0.78-0.93)  5.29E-04
                 MARIE    0.99   (0.92-1.06)  6.94E-01      1.14   (0.99-1.31)  7.50E-02
rs2571391 [29]   BPC3     1.16   (1.06-1.27)  1.42E-03      1.16   (1.06-1.27)  1.42E-03
                 MARIE    1.03   (0.95-1.10)  4.82E-01      0.91   (0.79-1.06)  2.25E-01
rs498872 [30]    BPC3     1.15   (1.05-1.26)  2.21E-03      1.15   (1.05-1.26)  2.21E-03
                 MCCS     0.96   (0.80-1.15)  6.60E-01      1.05   (0.78-1.43)  7.30E-01
                 CPS2     0.92   (0.85-1.01)  7.10E-02      0.96   (0.63-1.47)  8.50E-01
                 MARIE    1.01   (0.91-1.12)  9.07E-01      0.96   (0.80-1.14)  6.11E-01

a OR = Odds Ratio. 
b 95% CI = 95% Confidence Intervals. 
c All analyses were adjusted for age at diagnosis and in the BPC3 for cohort of provenience. 
doi:10.1371/journal.pone.0085955.t001 
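
The odds ratios, confidence intervals and P values in Table 1 are linked through the Wald statistic on the log-odds scale. The sketch below recovers the standard error, z statistic and two-sided P value from a reported OR and its 95% CI. It assumes Wald-type intervals, which the paper does not state explicitly, and the function name is hypothetical. Because the tabulated values are rounded to two decimals, the result should only approximate the reported P value.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=Z_95):
    """Recover (se, z, p_value) on the log-odds scale from an OR and its CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal deviate.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value


# rs2000999 in BPC3 as listed in Table 1: OR 0.83 (0.75-0.93), P_trend 6.72E-04.
se, z, p = wald_from_ci(0.83, 0.75, 0.93)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```

For this row the recovered P value is of the same order as the tabulated 6.72E-04, with the small difference attributable to rounding of the OR and CI.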



Results 

None of the 3079 selected variants in the BPC3 ER- GWAS was 
significant at the adjusted threshold. A total of 186 variants were associated 
with ER- breast cancer risk at a conventional threshold of 
P<0.05, with P values ranging from 0.049 to 2.3×10⁻⁴ (Figure 1). 
The strongest observed association was a decreased risk of ER- 
BC with rs8396 (ORhet = 0.84, 95% CI 0.76-0.92; 
ORhom = 0.71, 95% CI 0.58-0.85). We selected the 10 most significant 
SNPs (shown in Table 1) and analyzed them in independent 
samples to determine whether they were genuinely associated with 
BC overall and with ER- breast cancer in particular. All the 
polymorphic variants were in Hardy-Weinberg equilibrium with 
the exception of rs12150660 in the CPS-II and MARIE cohorts 
and rs13397985 in the CPS-II cohort. Therefore, CPS-II was not 
used as a replication set for rs12150660 and rs13397985 and 
MARIE was not used for rs12150660. In addition, one 
polymorphic variant, rs8396, was not used in the analysis because 
it had a call rate lower than 95% (88.2%). 



Only rs11229030, a variant originally found associated with risk 
of Crohn's disease, was nominally associated with a decreased risk 
of ER- BC (OR 0.85, 95% CI 0.75-1.00, P value = 0.049). The 
association was observed only in the MARIE study. The results of 
all the analyses are shown in Table 1. Additional information on 
the original reports can be found at http://www.genome.gov/gwastudies/. 
We also performed a meta-analysis across the 
various studies, but the results were very heterogeneous, 
suggesting a negative finding (forest plots, heterogeneity P-values 
and I² statistics are shown in Figure S1). 
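
For readers unfamiliar with the heterogeneity statistics mentioned above, the following sketch computes an inverse-variance fixed-effect pooled odds ratio together with Cochran's Q and I². The paper does not state which meta-analysis model was used, so the fixed-effect model is an assumption here, as are the function name and the choice of example. Standard errors are back-calculated from the rounded confidence intervals in Table 1, so the output is approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fixed_effect_meta(estimates, z_crit=Z_95):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    `estimates` is a list of (odds_ratio, ci_low, ci_high) tuples.
    """
    betas, weights = [], []
    for odds_ratio, lo, hi in estimates:
        se = (math.log(hi) - math.log(lo)) / (2.0 * z_crit)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    w_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    pooled_se = math.sqrt(1.0 / w_sum)
    return {
        "pooled_or": math.exp(pooled_beta),
        "ci": (math.exp(pooled_beta - z_crit * pooled_se),
               math.exp(pooled_beta + z_crit * pooled_se)),
        "Q": q,
        "df": df,
        "I2": i2,
    }


# ER- replication estimates for rs11229030 as listed in Table 1.
result = fixed_effect_meta([
    (1.01, 0.77, 1.33),   # MCCS
    (1.31, 0.89, 1.94),   # CPS2
    (0.85, 0.72, 1.00),   # MARIE
])
print(f"pooled OR = {result['pooled_or']:.2f}, "
      f"Q = {result['Q']:.2f} on {result['df']} df, I2 = {result['I2']:.0f}%")
```

With these three estimates more than half of the observed variation is attributed to between-study heterogeneity, which illustrates why a single nominal signal in one study is interpreted cautiously.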

Discussion 

Pleiotropy is a fairly common phenomenon that is defined as 
one gene or allelic variant having an effect on multiple phenotypes. 
In a recent paper based on data from the catalogue of published 
GWAS, Sivakumaran and colleagues reported that 4.6% of 
the SNPs and 16.9% of the genes present in the catalogue 
have pleiotropic effects [9]. These percentages probably 
underestimate the real biological significance, since they were 
obtained using very conservative criteria, namely considering 
only the SNPs available in the catalogue and associated with a 
particular disease or trait at a genome-wide level. Using data from 
GWAS meta-analyses, pleiotropy seems to play a much stronger 
role for specific diseases; for example, Cotsapas and collaborators 
reported that 44% of the susceptibility loci for autoimmune 
diseases overlap [21]. In a two-stage analysis of 3509 ER+ cases, 
2543 ER- cases and 7031 healthy controls, none of the SNPs showed a 
statistically significant association with breast cancer in the 
replication analysis. The strongest signal in the replication 
analysis was given by rs11229030 (a Crohn's disease susceptibility 
allele), which was associated with a decreased risk of ER- BC (P 
value = 0.049) only in the MARIE study, but not in CPS-II or 
MCCS, suggesting that the association is probably due to 
chance. 

There are several possible reasons why our pleiotropic approach 
failed to identify new SNPs associated with ER- BC. First, ER- 
BC may be associated with uncommon biologic pathways that are 
not shared with many other diseases and, therefore, may not be 
influenced by pleiotropy. This is consistent with the fact that there 
are several SNPs which are specifically associated with ER- but 
not ER+ disease. Alternatively, ER- BC may share genetic risk factors 
with other common disease traits and phenotypes, but not with 
those we included in our analysis. The pleiotropic approach we 
used is necessarily limited by the number of disease traits and 
phenotypes that have been examined with enough statistical power 
to identify GWAS hits. It is possible that disease traits and 
phenotypes with biologic pathways similar to ER- BC have not 
been examined adequately and are yet to be included in the 
NHGRI database. 

We are aware of several limitations that this work might present: 
first, we were not able to include all the SNPs from the catalogue 
because 113 (3.6% of the total selected SNPs) variants were 